Support for out-of-source quantizers #3534
base: main
Conversation
Thanks! This is ok, but we'll need to confirm if this might cause some issues downstream
@rolandtannous Can you check if this causes any issues? Thanks
@gemini-code-assist review
Code Review
This pull request introduces a flexible way to apply custom quantization schemes by exposing _prepare_for_qat as a replaceable method on FastLlamaModel and FastModel. This is a great addition for extensibility and a good first step towards the goal described in issue #3521.
However, I've found critical issues in the implementation of the new methods in both unsloth/models/llama.py and unsloth/models/loader.py. The method signatures are incorrect for the decorators used (@staticmethod and @classmethod), which will lead to TypeError exceptions at runtime. My review comments include specific suggestions to fix these issues.
Regarding the naming conventions for quantization functions you mentioned, that's a valid concern for a more general solution. I agree it could be addressed in a follow-up PR to keep this one focused.
Once the signature issues are addressed, this PR will be a solid improvement.
class FastLlamaModel:

    @staticmethod
    def _prepare_for_qat(cls, model, qat_scheme):
The signature of the _prepare_for_qat static method is incorrect. A staticmethod does not receive the class as an implicit first argument. The cls parameter will cause a TypeError at runtime because the method will be called with two arguments (model, qat_scheme), but it expects three. To fix this, the cls parameter should be removed.
Suggested change:
-    def _prepare_for_qat(cls, model, qat_scheme):
+    def _prepare_for_qat(model, qat_scheme):
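To make the failure mode concrete, here is a minimal, standalone sketch of the same pattern (the class name Demo and the string arguments are illustrative, not Unsloth's actual call site):

```python
class Demo:
    @staticmethod
    def _prepare_for_qat(cls, model, qat_scheme):
        # A staticmethod receives no implicit first argument, so `cls` is
        # just an ordinary positional parameter here.
        return model

# The call site supplies only (model, qat_scheme): `cls` absorbs the model,
# `model` absorbs the scheme, and `qat_scheme` is left unfilled.
try:
    Demo._prepare_for_qat("fake_model", "fake_scheme")
except TypeError as exc:
    print(exc)  # e.g. "missing 1 required positional argument: 'qat_scheme'"
```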
    @classmethod
    def _prepare_for_qat(model, qat_scheme):
        model = _prepare_model_for_qat(model, qat_scheme)
        return model
This method is decorated with @classmethod, but its signature (model, qat_scheme) doesn't account for the implicit cls argument that class methods receive, which will cause a TypeError. The first argument is also confusingly named model when it would receive the class object.
Since this method doesn't use the class object, and for consistency with FastLlamaModel, it should be a @staticmethod with the correct signature.
Suggested change:
-    @classmethod
-    def _prepare_for_qat(model, qat_scheme):
-        model = _prepare_model_for_qat(model, qat_scheme)
-        return model
+    @staticmethod
+    def _prepare_for_qat(model, qat_scheme):
+        model = _prepare_model_for_qat(model, qat_scheme)
+        return model
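The @classmethod variant fails in the opposite direction; a standalone sketch (again with illustrative names, not the real call site) shows that the implicitly bound class plus the two explicit arguments is one argument too many:

```python
class Demo:
    @classmethod
    def _prepare_for_qat(model, qat_scheme):
        # `model` actually receives the class object (Demo) because of
        # @classmethod; the real model would land in `qat_scheme`.
        return qat_scheme

# The implicit class plus two explicit arguments makes three in total for a
# two-parameter method.
try:
    Demo._prepare_for_qat("fake_model", "fake_scheme")
except TypeError as exc:
    print(exc)  # e.g. "takes 2 positional arguments but 3 were given"
```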
Code Review
This pull request aims to provide a way for users to customize the quantization process by exposing _prepare_for_qat as a replaceable method. This is a valuable addition for flexibility. However, I've found critical issues in the implementation of the new methods in both unsloth/models/llama.py and unsloth/models/loader.py. The method signatures are incorrect and will lead to TypeError exceptions at runtime. I have provided suggestions to fix these bugs. I've also pointed out a minor style issue regarding an unnecessary blank line.
    def _prepare_for_qat(cls, model, qat_scheme):
        model = _prepare_model_for_qat(model, qat_scheme)
        return model
The signature for the new static method _prepare_for_qat is incorrect. A static method does not receive the class as its first argument. The cls parameter will cause a TypeError at runtime because it will be populated with the model object from the call site, and the model parameter will receive qat_scheme, leaving the qat_scheme argument unfulfilled.
To fix this, you should remove the cls parameter from the method signature.
Suggested change:
-    def _prepare_for_qat(cls, model, qat_scheme):
-        model = _prepare_model_for_qat(model, qat_scheme)
-        return model
+    def _prepare_for_qat(model, qat_scheme):
+        model = _prepare_model_for_qat(model, qat_scheme)
+        return model
    @classmethod
    def _prepare_for_qat(model, qat_scheme):
        model = _prepare_model_for_qat(model, qat_scheme)
        return model
The new _prepare_for_qat method is decorated with @classmethod, but its signature is incorrect. The first argument to a class method is the class itself. Here, it's confusingly named model, and the actual model object is passed as the second argument, which is received by the qat_scheme parameter. This will lead to a TypeError at runtime.
For consistency with the changes in FastLlamaModel and to ensure correct behavior, this should be a staticmethod with the correct signature.
Suggested change:
-    @classmethod
-    def _prepare_for_qat(model, qat_scheme):
-        model = _prepare_model_for_qat(model, qat_scheme)
-        return model
+    @staticmethod
+    def _prepare_for_qat(model, qat_scheme):
+        model = _prepare_model_for_qat(model, qat_scheme)
+        return model
The goal of this PR is to allow any user to easily change how quantization is applied during the fine tuning process.
A longer description of the issue can be found in #3521
By making _prepare_for_qat a staticmethod, it is possible to replace it like so:
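A minimal sketch of one way to do this (the function name my_prepare_for_qat is an illustrative placeholder, and the import path assumes FastLlamaModel as defined in unsloth/models/llama.py, the module touched by this PR):

```python
from unsloth.models.llama import FastLlamaModel  # module touched by this PR

def my_prepare_for_qat(model, qat_scheme):
    # Apply an out-of-source quantizer here (e.g. a Brevitas-based pass)
    # instead of Unsloth's built-in _prepare_model_for_qat.
    ...
    return model

# Because _prepare_for_qat is exposed as a plain staticmethod, it can be
# swapped out without modifying Unsloth's source:
FastLlamaModel._prepare_for_qat = staticmethod(my_prepare_for_qat)
```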
As mentioned in the issue above, even allowing for a custom quantization function (and thus a custom quantizer like Brevitas) does not solve all of the issues, since Unsloth makes a strong assumption about the names of the quantization functions for weights and activations.
This can be patched without necessarily changing anything within Unsloth, but I believe it might be worth thinking about a more general implementation for the naming conventions (either as a task for this PR or a follow-up one, you tell me).
cc @Datta0